11 results for 060102 Bioinformatics

in Helda - Digital Repository of University of Helsinki


Relevance:

10.00% 10.00%

Publisher:

Abstract:

Malignant mesothelioma (MM) is a rare, usually incurable disease mainly caused by former exposure to asbestos. Even though MM has a strong etiological link to asbestos, genetic factors may play a role, since not all cases can be linked to former asbestos exposure. This thesis focuses on lung diseases, mainly malignant mesothelioma (MM) and idiopathic pulmonary fibrosis (IPF), which resembles asbestosis. The specific asbestos-related pathways associated with malignant as well as non-malignant lung diseases still need to be clarified. Since most patients diagnosed with MM or asbestosis/fibrosis have a dismal prognosis and few therapeutic options are available, early diagnosis and better understanding of the disease pathogenesis are of the utmost importance. The first objective of this thesis was to identify asbestos-specific differentially expressed genes. This was approached by using high-resolution gene expression arrays and three different human lung cell lines, together with three different bioinformatics approaches. Whereas the first study aimed to elucidate potential early changes, the second study screened DNA copy number changes in MM tumour samples. This was performed using genome-wide microarrays to identify DNA copy number changes characteristic of MM. Study III focused on the role of gremlin in the regulation of bone morphogenetic proteins (BMPs) in IPF. Further studies were conducted in asbestos-exposed cell cultures as well as in an asbestos-induced mouse model. Furthermore, GATA-6 was studied in MM and metastatic pleural adenocarcinoma. The GATA transcription factors are important during embryonic development, but their role in cancer is still unclear. GATA-6 is a co-factor/target of thyroid transcription factor 1 (TTF-1), which is used in differential diagnostics of pleural MM and adenocarcinoma. Bioinformatics probed the genes and biological processes ordered in terms of significance, clusters, and highly enriched chromosomal regions.
The study revealed several already identified targets, produced new ideas about genes that are central to asbestos exposure, and provided supplementary data for researchers to check their own novel findings or ideas. The analysis revealed DNA copy number changes characteristic of MM tumors. The most common regions of loss were detected in 1p, 3p, 6q, 9p, 13, 14, and 22, and gains at 17q. The histological features of asbestosis and IPF are very similar, so IPF can be studied in asbestos models. The BMP antagonist gremlin was up-regulated by asbestos exposure in human epithelial cell lines, which was also observed in Study I. The transforming growth factor (TGF)-β and BMP expression and signaling activities were measured in murine and human fibrotic lungs. BMP-7 signaling was down-regulated in response to up-regulation of gremlin, and restoration of BMP-7 signaling prevented progression of fibrosis in mice. Therefore, the study suggests that the restoration of BMP-7 signaling in fibrotic lung could potentially aid in the treatment of IPF patients. Study IV revealed that GATA-6 was strongly expressed in the majority of the MM cases, and that its expression correlated statistically significantly with longer survival in subgroups of MM.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Helicobacter pylori infection is a risk factor for gastric cancer, which is a major health issue worldwide. Gastric cancer has a poor prognosis due to the unnoticeable progression of the disease and surgery is the only available treatment in gastric cancer. Therefore, gastric cancer patients would greatly benefit from identifying biomarker genes that would improve diagnostic and prognostic prediction and provide targets for molecular therapies. DNA copy number amplifications are the hallmarks of cancers in various anatomical locations. Mechanisms of amplification predict that DNA double-strand breaks occur at the margins of the amplified region. The first objective of this thesis was to identify the genes that were differentially expressed in H. pylori infection as well as the transcription factors and signal transduction pathways that were associated with the gene expression changes. The second objective was to identify putative biomarker genes in gastric cancer with correlated expression and copy number, and the last objective was to characterize cancers based on DNA copy number amplifications. DNA microarrays, an in vitro model and real-time polymerase chain reaction were used to measure gene expression changes in H. pylori infected AGS cells. In order to identify the transcription factors and signal transduction pathways that were activated after H. pylori infection, gene expression profiling data from the H. pylori experiments and a bioinformatics approach accompanied by experimental validation were used. Genome-wide expression and copy number microarray analysis of clinical gastric cancer samples and immunohistochemistry on tissue microarray were used to identify putative gastric cancer genes. Data mining and machine learning techniques were applied to study amplifications in a cross-section of cancers. FOS and various stress response genes were regulated by H. pylori infection. H. 
pylori-regulated genes were enriched in the chromosomal regions that are frequently changed in gastric cancer, suggesting that molecular pathways of gastric cancer and premalignant H. pylori infection that induces gastritis are interconnected. Sixteen transcription factors were identified as being associated with H. pylori infection-induced changes in gene expression. NF-κB transcription factor and p50 and p65 subunits were verified using electrophoretic mobility shift assays. ERBB2 and other genes located in 17q12-q21 were found to be up-regulated in association with copy number amplification in gastric cancer. Cancers with similar cell type and origin clustered together based on the genomic localization of the amplifications. Cancer genes and large genes were co-localized with amplified regions, and fragile sites, telomeres, centromeres and light chromosome bands were enriched at the amplification boundaries. H. pylori-activated transcription factors and signal transduction pathways function in cellular mechanisms that might be capable of promoting carcinogenesis of the stomach. Intestinal and diffuse type gastric cancers showed distinct molecular genetic profiles. Integration of gene expression and copy number microarray data allowed the identification of genes that might be involved in gastric carcinogenesis and have clinical relevance. Gene amplifications were demonstrated to be non-random genomic instabilities. Cell lineage, properties of precursor stem cells, tissue microenvironment and genomic map localization of specific oncogenes define the site specificity of DNA amplifications, whereas labile genomic features define the structures of amplicons. These conclusions suggest that the definition of genomic changes in cancer is based on the interplay between the cancer cell and the tumor microenvironment.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

The analysis of sequential data is required in many diverse areas such as telecommunications, stock market analysis, and bioinformatics. A basic problem related to the analysis of sequential data is the sequence segmentation problem. A sequence segmentation is a partition of the sequence into a number of non-overlapping segments that cover all data points, such that each segment is as homogeneous as possible. This problem can be solved optimally using a standard dynamic programming algorithm. In the first part of the thesis, we present a new approximation algorithm for the sequence segmentation problem. This algorithm has a smaller running time than the optimal dynamic programming algorithm, while retaining a bounded approximation ratio. The basic idea is to divide the input sequence into subsequences, solve the problem optimally in each subsequence, and then appropriately combine the solutions to the subproblems into one final solution. In the second part of the thesis, we study alternative segmentation models that are devised to better fit the data. More specifically, we focus on clustered segmentations and segmentations with rearrangements. While in the standard segmentation of a multidimensional sequence all dimensions share the same segment boundaries, in a clustered segmentation the multidimensional sequence is segmented in such a way that dimensions are allowed to form clusters. Each cluster of dimensions is then segmented separately. We formally define the problem of clustered segmentations and we experimentally show that segmenting sequences using this segmentation model leads to solutions with smaller error for the same model cost. Segmentation with rearrangements is a novel variation of the segmentation problem: in addition to partitioning the sequence, we also seek to apply a limited amount of reordering, so that the overall representation error is minimized.
We formulate the problem of segmentation with rearrangements and we show that it is an NP-hard problem to solve or even to approximate. We devise effective algorithms for the proposed problem, combining ideas from dynamic programming and outlier detection algorithms in sequences. In the final part of the thesis, we discuss the problem of aggregating results of segmentation algorithms on the same set of data points. In this case, we are interested in producing a partitioning of the data that agrees as much as possible with the input partitions. We show that this problem can be solved optimally in polynomial time using dynamic programming. Furthermore, we show that not all data points are candidates for segment boundaries in the optimal solution.
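The standard dynamic programming approach mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not state the error measure, so the code below assumes a one-dimensional sequence in which each segment is represented by its mean and homogeneity is measured by the sum of squared errors. The function name `segment` and its return format are hypothetical and are not taken from the thesis.

```python
from itertools import accumulate


def segment(seq, k):
    """Return (total_error, boundaries) of an optimal k-segmentation.

    Assumption: each segment is summarised by its mean and the error is
    the sum of squared deviations. boundaries holds the exclusive end
    index of every segment, so the last entry equals len(seq).
    """
    n = len(seq)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(seq)")

    # Prefix sums let us compute the squared error of any segment in O(1).
    s1 = [0.0] + list(accumulate(float(x) for x in seq))
    s2 = [0.0] + list(accumulate(float(x) * x for x in seq))

    def sse(i, j):
        # Squared error of seq[i:j] around its own mean.
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # best[m][j]: minimum error of splitting seq[:j] into m segments.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                cand = best[m - 1][i] + sse(i, j)
                if cand < best[m][j]:
                    best[m][j] = cand
                    back[m][j] = i

    # Follow the back-pointers from the end to recover the boundaries.
    bounds = []
    j = n
    for m in range(k, 0, -1):
        bounds.append(j)
        j = back[m][j]
    bounds.reverse()
    return best[k][n], bounds


error, bounds = segment([1, 1, 1, 5, 5, 5, 9, 9], 3)
print(error, bounds)
```

In this sketch the three nested loops make the running time quadratic in the sequence length for a fixed number of segments, which is the kind of cost that a faster approximation algorithm would aim to reduce.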

Relevance:

10.00% 10.00%

Publisher:

Abstract:

This thesis, which consists of an introduction and four peer-reviewed original publications, studies the problems of haplotype inference (haplotyping) and local alignment significance. The problems studied here belong to the broad area of bioinformatics and computational biology. The presented solutions are computationally fast and accurate, which makes them practical in high-throughput sequence data analysis. Haplotype inference is a computational problem where the goal is to estimate haplotypes from a sample of genotypes as accurately as possible. This problem is important as the direct measurement of haplotypes is difficult, whereas the genotypes are easier to quantify. Haplotypes are the key players when studying, for example, the genetic causes of diseases. In this thesis, three methods are presented for the haplotype inference problem, referred to as HaploParser, HIT, and BACH. HaploParser is based on a combinatorial mosaic model and hierarchical parsing that together mimic recombinations and point-mutations in a biologically plausible way. In this mosaic model, the current population is assumed to have evolved from a small founder population. Thus, the haplotypes of the current population are recombinations of the (implicit) founder haplotypes with some point-mutations. HIT (Haplotype Inference Technique) uses a hidden Markov model for haplotypes, and efficient algorithms are presented to learn this model from genotype data. The model structure of HIT is analogous to the mosaic model of HaploParser with founder haplotypes. Therefore, it can be seen as a probabilistic model of recombinations and point-mutations. BACH (Bayesian Context-based Haplotyping) utilizes a context tree weighting algorithm to efficiently sum over all variable-length Markov chains to evaluate the posterior probability of a haplotype configuration. Algorithms are presented that find haplotype configurations with high posterior probability.
BACH is the most accurate method presented in this thesis and has comparable performance to the best available software for haplotype inference. Local alignment significance is a computational problem where one is interested in whether the local similarities in two sequences are due to the fact that the sequences are related or just due to chance. Similarity of sequences is measured by their best local alignment score, and from that a p-value is computed. This p-value is the probability that two sequences picked from the null model have a best local alignment score that is as good or better. Local alignment significance is used routinely, for example, in homology searches. In this thesis, a general framework is sketched that allows one to compute a tight upper bound for the p-value of a local pairwise alignment score. Unlike the previous methods, the presented framework is not affected by so-called edge-effects and can handle gaps (deletions and insertions) without troublesome sampling and curve fitting.
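To make the definitions of best local alignment score and p-value concrete, the following minimal sketch computes a Smith-Waterman score and then estimates a p-value by naive sampling. This is the kind of sampling-based estimate that the thesis framework is said to avoid, and it is not the upper bound described in the abstract. The scoring values (match +2, mismatch -1, linear gap -2), the shuffling null model, and the function names are all assumptions made for illustration.

```python
import random


def local_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score (Smith-Waterman, linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            cur.append(max(0, diag, prev[j] + gap, cur[j - 1] + gap))
            best = max(best, cur[j])
        prev = cur
    return best


def empirical_p_value(a, b, trials=200, seed=0):
    """Estimate P(score >= observed) under an assumed shuffling null model."""
    rng = random.Random(seed)
    observed = local_score(a, b)
    hits = 0
    for _ in range(trials):
        shuffled = list(b)
        rng.shuffle(shuffled)  # Assumed null model: permute the letters of b.
        if local_score(a, shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (trials + 1)


print(local_score("GGACGTCC", "TTACGTAA"))
print(empirical_p_value("GGACGTCC", "TTACGTAA"))
```

An estimate of this kind needs many trials to resolve small p-values, which illustrates why an analytical upper bound that requires no sampling or curve fitting is attractive.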

Relevance:

10.00% 10.00%

Publisher:

Abstract:

The first glycyl radical in an enzyme was described 20 years ago and since then the family of glycyl radical enzymes (GREs) has expanded to include enzymes catalysing five chemically distinct reactions. The type enzymes of the family, anaerobic ribonucleotide reductase (RNRIII) and pyruvate formate lyase (PFL), had been studied long before it was known that they are GREs. Spectroscopic measurements on the radical and an observation that exposure to oxygen irreversibly inactivates the enzymes by cleavage of the protein proved that the radical is located on a particular glycine residue, close to the C-terminus of the protein. Both anaerobic RNRIII and PFL are important for many anaerobic and facultative anaerobic bacteria, as RNRIII is responsible for the synthesis of DNA precursors and PFL catalyses a key metabolic reaction in glycolysis. The crystal structures of both were solved in 1999 and they revealed that, although the enzymes do not share significant sequence identity, they share a similar structure - the radical site and residues necessary for catalysis are buried inside a ten-stranded α/β-barrel. GREs are synthesised in an inactive form and are post-translationally activated by an activating enzyme which uses S-adenosyl methionine and an iron-sulphur cluster to generate the radical. One of the goals of this thesis work was to crystallise the activating enzyme of PFL. This task is challenging as, like GREs, the activating component is inactivated by oxygen. The experiments were therefore carried out in an oxygen-free atmosphere. This is the first report of a crystalline GRE activating enzyme. Recently several new GREs have been characterised, all sharing sequence similarity to PFL but not to RNRIII. Also, the genome sequencing projects have identified many PFL-like GREs of unknown function, usually annotated as PFLs.
In the present thesis I describe the grouping of these PFL family enzymes based on the sequence similarity and analyse the conservation patterns when compared to the structure of E. coli PFL. Based on this information an activation route is proposed. I also report a crystal structure of one of the PFL-like enzymes with unknown function, PFL2 from Archaeoglobus fulgidus. As A. fulgidus is a hyperthermophilic organism, possible mechanisms stabilising the structure are discussed. The organisation of an active site of PFL2 suggests that the enzyme may be a dehydratase. Keywords: glycyl radical, enzyme, pyruvate formate lyase, x-ray crystallography, bioinformatics

Relevance:

10.00% 10.00%

Publisher:

Abstract:

The glomerular epithelial cells and their intercellular junctions, termed slit diaphragms, are essential components of the filtration barrier in the kidney glomerulus. Nephrin is a transmembrane adhesion protein of the slit diaphragm and a signalling molecule regulating podocyte physiology. In congenital nephrotic syndrome of the Finnish type, mutation of nephrin leads to disruption of the permeability barrier and leakage of plasma proteins into the urine. This doctoral thesis hypothesises that novel nephrin-associated molecules are involved in the function of the filtration barrier in health and disease. Bioinformatics tools were utilized to identify novel nephrin-like molecules in genomic databases, and their distribution in the kidney and other tissues was investigated. Filtrin, a novel nephrin homologue, is expressed in the glomerular podocytes and, according to immunoelectron microscopy, localizes at the slit diaphragm. Interestingly, the nephrin and filtrin genes, NPHS1 and KIRREL2, are located in a head-to-head orientation on chromosome 19q13.12. Another nephrin-like molecule, Nphs1as, was cloned in mouse; however, its expression was detected not in the kidney but in the brain and lymphoid tissue. Notably, Nphs1as is transcribed from the nephrin locus in an antisense orientation. The glomerular mRNA and protein levels of filtrin were measured in kidney biopsies of patients with proteinuric diseases, and a marked reduction of filtrin mRNA levels was detected in the proteinuric samples as compared to controls. In addition, altered distribution of filtrin in injured glomeruli was observed, with the most prominent decrease of the expression in focal segmental glomerulosclerosis. The role of the slit diaphragm-associated genes in the development of diabetic nephropathy was investigated by analysing single nucleotide polymorphisms.
The genes encoding filtrin, densin-180, NEPH1, podocin, and alpha-actinin-4 were analysed, and polymorphisms at the alpha-actinin-4 gene were associated with diabetic nephropathy in a gender-dependent manner. Filtrin is a novel podocyte-expressed protein with localization at the slit diaphragm, and the downregulation of filtrin seems to be characteristic for human proteinuric diseases. In the context of the crucial role of nephrin for the glomerular filter, filtrin appears to be a potential candidate molecule for proteinuria. Although not expressed in the kidney, the nephrin antisense Nphs1as may regulate the expression of nephrin in extrarenal tissues. The genetic association analysis suggested that the alpha-actinin-4 gene, encoding an actin-filament cross-linking protein of the podocytes, may contribute to susceptibility for diabetic nephropathy.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Innate immunity and host defence are rapidly evoked by structurally invariant molecular motifs common to the microbial world, called pathogen associated molecular patterns (PAMPs). In addition to PAMPs, endogenous molecules released in response to inflammation and tissue damage, danger associated molecular patterns (DAMPs), are required for eliciting the response. The most important PAMPs of viruses are viral nucleic acids, their genome or its replication intermediates, whereas the identity and characteristics of virus infection-induced DAMPs are poorly defined. PAMPs and DAMPs engage a limited set of germ-line encoded pattern recognition receptors (PRRs) in immune and non-immune cells. Membrane-bound Toll-like receptors (TLRs), cytoplasmic retinoic acid inducible gene-I (RIG-I)-like receptors (RLRs) and nucleotide-binding oligomerization domain-like receptors (NLRs) are important PRRs involved in the recognition of the molecular signatures of viral infection, such as double-stranded ribonucleic acids (dsRNAs). Engagement of PRRs results in local and systemic innate immune responses which, when activated against viruses, evoke secretion of antiviral and pro-inflammatory cytokines, and programmed cell death, i.e., apoptosis of the virus-infected cell. Macrophages are the central effector cells of innate immunity. They produce significant amounts of antiviral cytokines, called interferons (IFNs), and pro-inflammatory cytokines, such as interleukin (IL)-1β and IL-18. IL-1β and IL-18 are synthesized as inactive precursors, pro-IL-1β and pro-IL-18, that are processed by caspase-1 in a cytoplasmic multiprotein complex, called the inflammasome. After processing, these cytokines are biologically active and are secreted. The signals and secretory routes that activate inflammasomes and the secretion of IL-1β and IL-18 during virus infections are poorly characterized.
The main goal of this thesis was to characterize influenza A virus-induced innate immune responses and host-virus interactions in human primary macrophages during an infection. Methodologically, various techniques of cellular and molecular biology, as well as proteomic tools combined with bioinformatics, were utilized. Overall, the thesis provides insights into inflammatory and antiviral innate immune responses, and characterizes host-virus interactions during influenza A virus infection in human primary macrophages.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Ewing sarcoma is an aggressive and poorly differentiated malignancy of bone and soft tissue. It primarily affects children, adolescents, and young adults, with a slight male predominance. It is characterized by a translocation between chromosomes 11 and 22 resulting in the EWSR1-FLI1 fusion transcription factor. The aim of this study is to identify putative Ewing sarcoma target genes through an integrative analysis of three microarray data sets. Array comparative genomic hybridization is used to measure changes in DNA copy number, and the results are analyzed to detect common chromosomal aberrations. mRNA and miRNA microarrays are used to measure expression of protein-coding and miRNA genes, and these results are integrated with the copy number data. Chromosomal aberrations typically also contain bystanders in addition to the driving tumor suppressors and oncogenes, and integration with expression helps to identify the true targets. Correlation between expression of miRNAs and their predicted target mRNAs is also evaluated to assess the results of post-transcriptional miRNA regulation on mRNA levels. The highest frequencies of copy number gains were identified in chromosome 8, 1q, and X. Losses were most frequent in 9p21.3, which also showed an enrichment of copy number breakpoints relative to the rest of the genome. Copy number losses in 9p21.3 were found to have a statistically significant effect on the expression of MTAP, but not on CDKN2A, which is a known tumor suppressor in the same locus. MTAP was also down-regulated in the Ewing sarcoma cell lines compared to mesenchymal stem cells. Genes exhibiting elevated expression in association with copy number gains and up-regulation compared to the reference samples included DCAF7, ENO2, MTCP1, and STK40. Differentially expressed miRNAs were detected by comparing Ewing sarcoma cell lines against mesenchymal stem cells.
21 up-regulated and 32 down-regulated miRNAs were identified, including miR-145, which has been previously linked to Ewing sarcoma. The EWSR1-FLI1 fusion gene represses miR-145, which in turn targets FLI1, forming a mutually repressive feedback loop. In addition to showing higher expression linked to copy number gains and higher expression compared to mesenchymal stem cells, STK40 was also found to be a target of four different miRNAs that were all down-regulated in Ewing sarcoma cell lines compared to the reference samples. SLCO5A1 was identified as the only up-regulated gene within a frequently gained region in chromosome 8. This region was gained in over 90 % of the cell lines, and also with a higher frequency than the neighboring regions. In addition, SLCO5A1 was found to be a target of three miRNAs that were down-regulated compared to the mesenchymal stem cells.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Myeloproliferative neoplasms (MPN) and myelodysplastic syndromes (MDS) are a heterogeneous group of clonal hematopoietic disorders whose etiology and molecular pathogenesis are poorly understood. During the past decade, enormous developments in microarray technology and bioinformatics methods have made it possible to mine novel molecular alterations in a large number of malignancies, including MPN and MDS, which has facilitated the detection of new prognostic, predictive and therapeutic biomarkers for disease stratification. By applying novel microarray techniques, we profiled copy number alterations and microRNA (miRNA) expression changes in bone marrow aspirate and blood samples. In addition, we set up and validated an miRNA expression test for bone marrow core biopsies in order to utilize the large archive material available in many laboratories. We also tested JAK2 mutation status and compared it with the in vitro growth pattern of hematologic progenitor cells. In the study focusing on 100 MPN cases, we detected a Janus kinase 2 (JAK2) mutation in 71 cases. We observed spontaneous erythroid colony growth in all mutation-positive cases in addition to nine mutation-negative cases. Interestingly, seven JAK2V617F-negative ET cases showed spontaneous megakaryocyte colony formation, one case of which also harbored a myeloproliferative leukemia virus oncogene (MPL) mutation. We studied copy number alterations in 35 MPN and 37 MDS cases by using oligonucleotide-based array comparative genomic hybridization (array CGH). Only one essential thrombocythemia (ET) case presented copy number alterations in chromosomes 1q and 13q. In contrast, MDS cases were characterized by numerous novel cryptic chromosomal aberrations with the most common copy number losses at 5q21.3q33.1 and 7q22.1q33, while the most common copy number gain was trisomy 8.
As for the study of the bone marrow core biopsy samples, we showed that even though these samples were embedded in paraffin and underwent decalcification, they were reliable sources of miRNA and suitable for array expression analysis. Further, when studying the miRNA expression profiles of the 19 MDS cases, we found that, compared to controls, two miRNAs (one Epstein-Barr virus miRNA (miR-BART13) and one human miRNA (hsa-miR-671-5p)) were downregulated, whereas two other miRNAs (hsa-miR-720 and hsa-miR-21) were upregulated. However, we could find no correlation between copy number alterations and microRNA expression when integrating these two data sets. This thesis brings to light new information about genomic changes implicated in the development of MPN and MDS, and also underlines the power of applying genome-wide array screening techniques in neoplasias. Rapid advances in molecular techniques and the integration of different genomic data will enable the discovery of the biological contexts of many complex disorders, including myeloid neoplasias.